1 Abstract

Adelaide’s population is increasing, resulting in greater road congestion and its associated costs. As Adelaide residents become more reliant on public transport, it is critical to quantify the resilience of the public bus network to this rise in road congestion. The analyses in this report seek to provide the South Australia Department for Infrastructure and Transport (DIT) with such an estimate by examining the relationship between motor vehicle and bus travel times, especially during the morning and evening rush hours. The core objective is to determine, for a specified road segment, the extent to which deviations in motor vehicle travel times from their usual values for the same time frame are matched by similar deviations in bus travel times; a lower correlation indicates a bus network that is more resilient to increasing road congestion. Bus travel times are calculated from trip updates obtained from the publicly available General Transit Feed Specification Realtime (GTFSR) feed, as the time taken between the first and last stops of each trip on the segment. Motor vehicle travel times are calculated from link metrics available through DIT’s Addinsight data lake. After a descriptive analysis of the relationship between the two travel times, the travel times were standardized relative to the time of day to better examine how bus travel times respond to variations in motor vehicle travel times. The analysis was performed on South Road for March 2022, in both the northbound direction towards the city and the southbound direction away from the city. It shows a strong correlation between the standardized evening southbound travel times. However, the variations are relatively small, so it is difficult to infer the robustness of bus transport to congestion; variations of larger magnitude would need to be observed and examined.
It also finds that the standardized morning travel times towards the city are more varied and less strongly correlated. This implies that morning bus travel towards the city is relatively more robust to congestion than evening bus travel away from the city. The analysis was also performed on Marion Road for comparison with South Road. It shows that motor vehicle travel times on Marion Road do not vary greatly, so no conclusion on the resilience of the bus network to road congestion on Marion Road can be substantiated.


2 Background

Adelaide’s population increased from 1.1 million to 1.3 million residents between 2006 and 2016, with 66 million more kilometers traveled on the road network over that period. Infrastructure Australia paints a dire picture of road congestion in Adelaide and its continued worsening in the coming years, in line with both an increasing population and an increasing reliance on public transport in comparison to cars. Its report estimated the annualized cost of road congestion for Greater Adelaide at approximately $1.4 billion in 2016, projected to rise to $2.6 billion in 2031 (Infrastructure Australia, 2019).

With this backdrop in mind, the client, DIT, holds an untapped wealth of traffic data collected through Bluetooth probes. These probes register passing motor vehicles at a particular time and location, producing a metric for road congestion.

This data will be examined alongside publicly available historical real-time bus trip updates published via GTFSR, which provide the predicted arrival time at each stop on a bus trip. The analysis aims to identify the relationship between bus travel times and road congestion on road segments of interest, and how robust bus travel times are to that congestion, especially during peak times.


3 Objectives

The aim of the proposed analysis is to investigate the extent of the relationship between bus travel times and road congestion - as measured by motor vehicle travel times - on identified road segments, where a strong relationship indicates a road segment where the bus travel times are less robust to congestion. Bus travel times are calculated as the time taken between the first and last stops of a segment.

The analysis aims to fulfill the following objectives:

  1. Detailed travel time or congestion analysis comparing public transport response to road traffic on selected sections of road over a given period of time, especially during peak hours

  2. Repeatable methodology, code, functions, and visuals that produce detailed analysis on other segments of interest

In fulfilling the first objective, the first segment of road analysed is South Road in Adelaide. The period of time chosen is March 2022. The second segment of road analysed is Marion Road during the same period of time.

Regarding the second objective, the methodology and code are designed to require as little manual input and editing as possible when applied to different road segments.

The analysis undertaken in this report will form the basis of future analysis into:


4 Data Sources, Description, and Wrangling

Three main data sources are used: DIT Addinsight, General Transit Feed Specification (GTFS), and GTFSR. These data sources and their associated sub-sources will be outlined below.

The methodology will be illustrated on South Road, one of Adelaide’s most important roads, which regularly suffers from congestion (Infrastructure Australia, 2019).


Figure 4.1: South Rd on map. Source: Google Maps


4.1 GTFS

GTFS is a common format, developed by Google and used by public transport agencies around the world. It contains static (scheduled) information about public transport services, such as routes, stops, schedules, and geographic transit information. For the purposes of this analysis, only the bus routes and bus stops datasets are used.

4.1.1 Routes

These are the bus routes that go through South Road. The routes were identified by overlaying all the network routes on a map in Tableau and the routes on South Road were manually highlighted and exported to a list. The dataset simply contains the unique collection of route_ids on the segment.

4.1.2 Stops

The list of bus stops on the segment was identified using Tableau, in the same way as the routes.

A dataset containing all the stops in the bus network, together with information about each stop, is filtered to only the stops on the segment.

Table 4.1: Stops data description
Variable     Description
stop_id      Unique stop identifier
stop_name    Name of the location, using a name that people will understand
stop_desc    Address of the stop
stop_lat     Latitude of the stop
stop_lon     Longitude of the stop
direction    Road direction of the stop

The direction variable is manually created. In this case, if the stop is on the east side of South Road, then it is southbound (SB) away from the city; if the stop is on the west side of South Road, then it is northbound (NB) towards the city.

The bus stops will be plotted on a map to confirm they are all, in fact, on South Road.


Figure 4.2: Bus stops on South Road


4.2 GTFSR - Trip Updates

Unlike GTFS, which provides static information, GTFSR provides two types of real-time information. The first type is a trip's real-time updates of the expected arrival times and delays at its bus stops. The second type is a real-time update of a bus's geographic position and speed at a specific point in time. This analysis uses only the former.

Once the bus routes passing through the segment were identified as outlined above, the real-time updates for all trips on those routes in March 2022 were retrieved from the AWS database using Athena. This dataset is used to derive the bus travel time through the segment. That travel time is the first element of the relationship assessed in this analysis; the second is the vehicle travel time, which serves as a measure of congestion.

First, the unedited data will be described.

Table 4.2: Unedited updates data description
Variable        Description
route_id        Unique route identifier
start_date      Start date of the trip
vehicle_id      Unique vehicle identifier
timestamp       Timestamp of the real-time update
trip_id         Unique trip identifier
stop_sequence   Order of stops for a particular trip
stop_id         Unique stop identifier
delay           Current schedule deviation for the trip, in seconds: positive if the vehicle is late, negative if it is ahead of schedule
arrival_time    Predicted arrival time for a stop on a particular trip

It is important to note the following:

  • One route_id can have many trip_ids

  • A trip_id occurs at most once a day, but the same trip_id can occur on multiple days

As a bus trip progresses, the real-time predictions of the arrival times at the upcoming stops are updated at certain time intervals.

The following preliminary adjustments were made:

  1. Each stop on a given trip can have multiple arrival time predictions, one for each update timestamp before the bus reaches that stop. The SQL query therefore ensures that each stop keeps only the predicted arrival time with the latest update timestamp, since later predictions are more accurate

  2. As a trip can begin and end outside the bounds of the segment, the updates were constrained only to those stops within the segment, in either direction

  3. Weekends and holidays were removed as we are interested in the relationships during working days only
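The three preliminary adjustments can be sketched in Python as follows. This is a minimal illustration, not the report's SQL query. The record layout (dictionaries with start_date, trip_id, stop_id, stop_sequence, timestamp, and arrival_time keys), the function name preliminary_filter, and the use of a set of holiday dates are all assumptions.

```python
from datetime import date


def preliminary_filter(updates: list[dict], segment_stops: set[str], holidays: set[date]) -> list[dict]:
    """Hypothetical sketch of the three preliminary adjustments to raw trip updates.

    updates: dicts with start_date (datetime.date), trip_id, stop_id,
    stop_sequence, timestamp and arrival_time (assumed layout).
    """
    latest = {}
    for u in updates:
        # Adjustment 2: keep only stops that lie within the segment.
        if u["stop_id"] not in segment_stops:
            continue
        # Adjustment 3: drop weekends (weekday() 5 = Saturday, 6 = Sunday) and holidays.
        day = u["start_date"]
        if day.weekday() >= 5 or day in holidays:
            continue
        # Adjustment 1: keep only the prediction with the latest update timestamp
        # for each stop of each trip on each day.
        key = (day, u["trip_id"], u["stop_id"])
        if key not in latest or u["timestamp"] > latest[key]["timestamp"]:
            latest[key] = u
    # Return the surviving rows ordered by day, trip and stop sequence.
    return sorted(latest.values(), key=lambda u: (u["start_date"], u["trip_id"], u["stop_sequence"]))
```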

A new variable, to_stop_time, was created. It measures the time, in seconds, taken to reach each stop from the previous stop within each trip. The variable was created for three purposes: to give a deeper understanding of the data, to highlight errors, and to support future uses such as drilling down to examine patterns on a per-stop basis.
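The following minimal sketch shows how to_stop_time can be derived. It assumes the rows are already sorted by trip and stop_sequence and that arrival times are in seconds; the function name add_to_stop_time is hypothetical. A negative value in the output flags the kind of error discussed next.

```python
from itertools import groupby


def add_to_stop_time(rows):
    """Add to_stop_time: seconds taken to reach each stop from the previous stop.

    rows must already be sorted by (start_date, trip_id, stop_sequence), with
    arrival_time in seconds. The first stop of each trip has no previous stop,
    so its to_stop_time is None.
    """
    out = []
    # groupby only merges consecutive rows, hence the sorting requirement.
    for _, trip in groupby(rows, key=lambda r: (r["start_date"], r["trip_id"])):
        previous_arrival = None
        for r in trip:
            gap = None if previous_arrival is None else r["arrival_time"] - previous_arrival
            out.append({**r, "to_stop_time": gap})
            previous_arrival = r["arrival_time"]
    return out
```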

This variable revealed a range of errors that needed to be corrected. Figure 4.3 shows the data before any remedial action was taken.


Figure 4.3: Unedited to-stop times contain negative values


Figure 4.3 shows that to_stop_time contains negative values to the left of the red line. This is clearly an error, as the time taken to reach a stop cannot be negative. Very high delay values are also visible, in clusters above 70 minutes.

In total, eight types of errors were identified in the data. The list of errors, an example of each error, and the code to rectify them can be found in Appendix 8.1.

Each type of error was identified and remedied in a way that neither introduces further errors nor removes large amounts of data. Identifying the correct order in which to tackle the error types was also essential. This ensured that errors were removed as surgically as possible, both to minimize data loss and because the relationships between the stops on each trip are sensitive to changes.

In total, 3.83% of the entries in the data were identified as errors and fixed. The cleaned data now appears as follows:


Figure 4.4: Cleaned to-stop times do not contain negative values


With the data cleaned, two additional variables, first_stop and last_stop, were created to identify the first and last stops of each trip within the segment. The total time per trip can then be derived as the time between these two stops. The arrival times at the first and last stops are regarded as the start and end times of the trip, respectively. The distribution of trip times per direction is shown below for the two most frequent first-last stop pairs in each direction.
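The derivation of first_stop, last_stop, and the trip time can be sketched as follows. The sketch assumes that each cleaned row also carries the stop's direction (joined from the stops data) and its delay in seconds. The function name trip_times and all output field names other than first_stop and last_stop are hypothetical. The delays at the first and last stops are carried along because they are needed later for delay_diff.

```python
def trip_times(rows):
    """Collapse cleaned stop rows into one record per trip within the segment.

    Each row is assumed to carry start_date, trip_id, stop_id, stop_sequence,
    direction, delay (s) and arrival_time (s).
    """
    ends = {}
    for r in rows:
        key = (r["start_date"], r["trip_id"])
        first, last = ends.get(key, (r, r))
        # The first/last stop on the segment has the lowest/highest stop_sequence.
        if r["stop_sequence"] < first["stop_sequence"]:
            first = r
        if r["stop_sequence"] > last["stop_sequence"]:
            last = r
        ends[key] = (first, last)

    trips = []
    for (start_date, trip_id), (first, last) in ends.items():
        trips.append({
            "start_date": start_date,
            "trip_id": trip_id,
            "direction": first["direction"],
            "first_stop": first["stop_id"],
            "last_stop": last["stop_id"],
            "start_time": first["arrival_time"],
            "end_time": last["arrival_time"],
            "trip_time": last["arrival_time"] - first["arrival_time"],
            "delay_first": first["delay"],
            "delay_last": last["delay"],
        })
    return trips
```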


Figure 4.5: Different stops pairs in the same direction have different trip times


As figure 4.5 shows, different first-last stop pairs in the same direction have different travel times. Trips can therefore differ in travel time solely because of where they start and end on the segment. Such travel times are not comparable, because they cover different distances. Therefore, only trips with the same pair of first and last stops within the segment are kept, and the remaining trips are discarded. Keeping a single pair of first and last stops per direction makes the distance constant for all trips, so their times are comparable.

This pair is the most frequent first-last stop pair in each direction, and only trips with this pair are kept in the data. The stop pair for each direction is shown in the map below:
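Keeping only the most frequent first-last stop pair per direction can be sketched with collections.Counter. The function name is hypothetical. Ties are broken in favor of the pair encountered first; this is an assumption, since the report does not say how ties are handled.

```python
from collections import Counter


def keep_most_frequent_pair(trips):
    """Keep only trips whose (first_stop, last_stop) is the most frequent pair in their direction."""
    counts = Counter((t["direction"], t["first_stop"], t["last_stop"]) for t in trips)
    best = {}
    # most_common() lists pairs from most to least frequent (ties in first-seen order),
    # so the first pair seen for each direction is its most frequent one.
    for (direction, first_stop, last_stop), _ in counts.most_common():
        best.setdefault(direction, (first_stop, last_stop))
    return [t for t in trips if best[t["direction"]] == (t["first_stop"], t["last_stop"])]
```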


Figure 4.6: Most occurring pair of bus stops per direction


The distribution of the trip times per direction is shown below:


Figure 4.7: Excessively large trip times exist, especially southbound


As figure 4.7 shows, some trip times are excessively long. Without further information, it is difficult to determine whether these are errors or genuine trip times. A variable called delay_diff is therefore created, which measures the absolute difference between the delay at the first stop and the delay at the last stop of each trip. A large value of this variable indicates that a long travel time is likely an error, with one of the stops having an artificially large delay or early arrival. A histogram of delay_diff is shown below:


Figure 4.8: Size of difference between first and last stop delays


Based on figure 4.8, trips with a delay_diff greater than 10 minutes were removed, as they were most likely errors. The resulting data appears as follows:
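The delay_diff filter can be sketched as follows. The sketch assumes that trip records hold the first-stop and last-stop delays, in seconds, under the hypothetical keys delay_first and delay_last. Trips whose delay_diff exceeds 600 seconds (10 minutes) are dropped, matching the threshold above.

```python
MAX_DELAY_DIFF = 10 * 60  # 10 minutes, in seconds


def drop_inconsistent_delays(trips, max_diff=MAX_DELAY_DIFF):
    """Remove trips whose first- and last-stop delays differ by more than max_diff seconds."""
    kept = []
    for t in trips:
        # delay_diff: size of the difference between the first and last stop delays.
        delay_diff = abs(t["delay_last"] - t["delay_first"])
        if delay_diff <= max_diff:
            kept.append({**t, "delay_diff": delay_diff})
    return kept
```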


Figure 4.9: Excessively large trip times no longer exist


The bus travel times are split into five-minute time periods, using the arrival time at the first stop of each trip to assign it to a period. For example, all bus trips that start between 2022-03-01 12:00:00 and 2022-03-01 12:05:00 are included in the same time frame. Since each time frame can contain multiple trips, their travel times are averaged into a single bus travel time. This establishes a one-to-one relationship with the vehicle travel times, which are also in five-minute intervals. The final dataset is described below:

Table 4.3: Aggregated bus trip travel times data description
Variable       Description
day            Date of measurement
time           Time of day of the measurement (hour:minute:second)
hour           Hour of the measurement
rush           Whether the measurement occurs during rush hour: morning rush hour is between 6:30am and 10am, evening rush hour is between 3:30pm and 7pm, and neither otherwise
direction      Direction of travel
number_buses   Original number of trips during the five-minute interval
bus_time       Average bus trip travel time across the segment
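The five-minute aggregation and the rush labels in Table 4.3 can be sketched as follows. The input layout and the function names are assumptions. The boundary handling is also an assumption: a window's start time counts as rush hour and its end time does not, because the report does not state whether 10am or 7pm themselves fall within rush hour.

```python
from collections import defaultdict
from datetime import datetime, time, timedelta
from statistics import mean


def rush_period(t: time) -> str:
    """Rush-hour label per Table 4.3 (start inclusive, end exclusive: an assumption)."""
    if time(6, 30) <= t < time(10, 0):
        return "Morning"
    if time(15, 30) <= t < time(19, 0):
        return "Evening"
    return "Neither"


def floor_5min(dt: datetime) -> datetime:
    """Round a timestamp down to the start of its five-minute frame."""
    return dt - timedelta(minutes=dt.minute % 5, seconds=dt.second, microseconds=dt.microsecond)


def aggregate_5min(trips):
    """trips: dicts with start (datetime of first-stop arrival), direction, trip_time (s)."""
    frames = defaultdict(list)
    for t in trips:
        frames[(floor_5min(t["start"]), t["direction"])].append(t["trip_time"])
    rows = []
    for (start, direction), times in sorted(frames.items()):
        rows.append({
            "day": start.date(),
            "time": start.time(),
            "hour": start.hour,
            "rush": rush_period(start.time()),
            "direction": direction,
            "number_buses": len(times),
            "bus_time": mean(times),  # average trip time in the frame
        })
    return rows
```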


4.3 DIT Addinsight

This is traffic information collected by DIT Addinsight using Bluetooth devices that detect Bluetooth-equipped vehicles when they come into range. The location of a Bluetooth device is called a site, and a link is a segment of road between two sites: an origin site and a destination site. This allows metrics such as the time taken to travel through a link to be calculated.
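The following snippet is a purely illustrative sketch of the principle: it derives link travel times by matching devices detected at both the origin and destination sites. It is not Addinsight's actual algorithm, which this report does not describe. The detection format (a device identifier mapped to detection times in seconds) and the function names are assumptions. Each origin detection is paired with the next detection of the same device at the destination, and the median of the matched times serves here as a robust link travel time.

```python
from statistics import median


def matched_travel_times(origin, destination):
    """Travel times (s) of devices seen at the origin site and later at the destination site.

    origin/destination: dict mapping a device identifier to its detection times (s).
    Each origin detection is paired with the next detection of that device at the destination.
    """
    times = []
    for device, origin_times in origin.items():
        dest_times = sorted(destination.get(device, []))
        for t0 in sorted(origin_times):
            later = [t1 for t1 in dest_times if t1 > t0]
            if later:
                times.append(later[0] - t0)
    return times


def link_travel_time(origin, destination):
    """Summarize a link with the median matched travel time (None if nothing matched)."""
    times = matched_travel_times(origin, destination)
    return median(times) if times else None
```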


4.3.1 Holidays

This dataset contains holiday dates, which is the only variable used.


5 Analysis

5.1 Travel Times Comparison

The travel times from the two datasets will be compared against one another. This gives a general understanding of the relationship and also validates the datasets, as we would expect both travel times to follow a similar pattern. The comparison is made through a series of graphs.

Figure 5.1: Vehicles are faster in both directions. Distributions of both types resemble each other


From figure 5.1 we learn the following:

  • For northbound travel to the city, vehicle travel time largely remains the same during both periods of rush hour, while bus travel time actually increases in the evening, a surprising result

  • For southbound travel away from the city, both travel times in the evening increase as expected and are more varied than the travel times in the morning

  • The bus travel times are generally slower than vehicle travel times as expected, and both types exhibit similar patterns overall


Figure 5.2: Northbound bus travel times are slower than vehicle travel times. Both exhibit similar patterns


Figure 5.3: Southbound bus travel times are slower than vehicle travel times. Both exhibit similar patterns


Figures 5.2 and 5.3 both show that the two types of travel time generally follow a similar pattern. This supports the validity of both datasets, as we would not expect them to show very different patterns. Buses are almost always slower than vehicles, because they need to load and unload passengers at bus stops along the road and, being heavy vehicles, they accelerate more slowly. Towards the end of the day, the travel times in both directions seem to level off at a low value. This is likely because there is less traffic on the road in the evening, so vehicles traverse the segment faster and only constant factors such as the speed limit and traffic lights affect the travel time. Note that the y-axes of figures 5.2 and 5.3 are on independent scales.

To gain a clearer picture of the patterns and relationship over an average day, the travel times are grouped into 30-minute time-of-day windows and averaged across all days. For example, for each travel type, all measurements occurring between 12:00 and 12:30 across all days are averaged and then plotted.
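The 30-minute averaging can be sketched as follows. Each measurement is assigned to a time-of-day slot (minutes after midnight, rounded down to a multiple of 30), and the measurements in each slot are averaged across all days for each direction and travel type. The input field names and the function name daily_profile are assumptions.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def daily_profile(rows, slot_minutes=30):
    """Average travel time per time-of-day slot across all days.

    rows: dicts with start (datetime), direction, mode ('bus' or 'vehicle') and
    travel_time (s). Returns {(direction, mode, slot): mean}, where slot is the
    slot's start in minutes after midnight (e.g. 720 for 12:00-12:30).
    """
    groups = defaultdict(list)
    for r in rows:
        start: datetime = r["start"]
        minute_of_day = start.hour * 60 + start.minute
        slot = minute_of_day - minute_of_day % slot_minutes
        groups[(r["direction"], r["mode"], slot)].append(r["travel_time"])
    return {key: mean(values) for key, values in groups.items()}
```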


Figure 5.4: Average travel time patterns by vehicle and direction. Peak times are highlighted


Figure 5.4 shows the average pattern of travel times across the day, by direction and type. The morning and evening rush hours are highlighted, as they are the parts of the day of interest. For northbound travel towards the city, both rush hours show a similar level of travel time for both types, and rush-hour travel times are not much greater than non-rush-hour travel times. This is unexpected, as northbound travel times towards the city would be expected to be higher during the morning rush hour. Southbound travel away from the city, however, follows expectations: the travel time for both types increases dramatically in the evening rush hour as workers leave the city.

A scatter plot of the travel times will be examined:


Figure 5.5: A positive relationship exists between both travel times for both directions during peak times


Figure 5.5 shows that a positive relationship exists between the travel times, more so for the southbound direction in the evening.

The correlation figures between the travel times are:

Table 5.1: Correlation between travel times per direction per rush hour
Rush      Direction   Correlation
Morning   NB          0.79
Morning   SB          0.49
Evening   NB          0.67
Evening   SB          0.87

Table 5.1 shows that the travel times between buses and vehicles are highly correlated in the morning northbound towards the city, and in the evening southbound away from the city.
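The correlations in Table 5.1 can be reproduced in outline as follows. The report does not state which coefficient was used, so Pearson's correlation is assumed here. The record layout (five-minute rows already matched between bus and vehicle travel times) and the function names are also assumptions.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def correlation_table(rows):
    """rows: matched five-minute records with rush, direction, bus_time and vehicle_time."""
    groups = defaultdict(lambda: ([], []))
    for r in rows:
        if r["rush"] == "Neither":
            continue  # only rush-hour periods are tabulated
        bus, vehicle = groups[(r["rush"], r["direction"])]
        bus.append(r["bus_time"])
        vehicle.append(r["vehicle_time"])
    return {key: round(pearson(bus, vehicle), 2) for key, (bus, vehicle) in groups.items()}
```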


5.2 Travel Times Variation Analysis

The goal of the analysis is to ascertain the extent of the relationship between variations in motor vehicle travel times and variations in bus travel times. Variation here is measured relative to travel times in the same time frame across the entire period. In other words, if the vehicle travel time deviates by a certain amount from the usual travel time for that time frame, is the deviation reflected in the bus travel time? If so, by how much?

To assess the variation, the travel times are standardized. A function, standardiser, was created that standardizes the bus travel times and the vehicle travel times separately against all the data in the period, based on one of the following:

  • the five-minute time frame. For example, a bus/vehicle travel time on 2022-03-01 between 7am and 7:05am would be standardized against all the other travel times in the period that occur between 7am and 7:05am

  • the hour of travel. For example, a bus/vehicle travel time on 2022-03-01 between 7am and 8am would be standardized against all the other travel times in the period that occur between 7am and 8am

  • the rush hour of travel. For example, a bus/vehicle travel time on 2022-03-01 during the morning rush hour would be standardized against all the other travel times in the period that occur during the morning rush hour

These options are passed to the function as an argument (time, hour, or rush). As the time frame widens, more data is available for standardization, but each value is compared against a wider time range, which increases bias. For this reason, in addition to standardizing the data, the standardiser function stores the number of data points in each time frame for the chosen method. The function also removes observations more than three standard deviations from the mean, as these are considered outliers that could distort the analysis.

Ideally, the travel times would be standardized against the same five-minute time frame across the entire period, as this would provide the highest accuracy. However, the bus trips in each five-minute time frame were averaged into a single travel time, and only one month of data (21 working days) is analyzed. This leaves at most 21 data points per five-minute time frame, which is not enough for standardization. Instead, the default standardization parameter is the hour, which provides many more data points at the cost of some bias.
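A minimal Python re-sketch of the standardiser function follows. It is hypothetical, because the report's own implementation is not shown. Several details are assumptions: grouping by direction as well as by the chosen time frame, using the sample standard deviation, and the field names. With 21 working days, a five-minute frame yields at most 21 points per group. An hour spans twelve five-minute frames and so yields up to 12 × 21 = 252 points, which is why hour is the default.

```python
from collections import defaultdict
from statistics import mean, stdev


def standardiser(rows, value, by="hour", max_z=3.0):
    """Hypothetical sketch: z-score rows[value] within (direction, by) groups.

    by is the grouping column: 'time' (five-minute frame), 'hour' or 'rush'.
    Adds '<value>_z' and the group size 'n_<by>', and drops rows more than
    max_z sample standard deviations from their group mean.
    """
    groups = defaultdict(list)
    for r in rows:
        groups[(r["direction"], r[by])].append(r[value])
    # A group needs at least two points to have a standard deviation.
    stats = {k: (mean(v), stdev(v), len(v)) for k, v in groups.items() if len(v) > 1}
    out = []
    for r in rows:
        key = (r["direction"], r[by])
        if key not in stats:
            continue  # a single observation cannot be standardized
        mu, sd, n = stats[key]
        z = 0.0 if sd == 0 else (r[value] - mu) / sd
        if abs(z) <= max_z:  # outlier removal
            out.append({**r, f"{value}_z": z, f"n_{by}": n})
    return out
```

Calling it once with value="bus_time" and once with value="vehicle_time" mirrors the separate standardization of the two travel times described above.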

With the bus and vehicle travel times standardized, we can now examine the relationship between the travel times with respect to variation. If the vehicle travel time deviates from the average relative to the time of day, do we observe a similar deviation by the bus travel time?

Plots of the standardized travel times are shown below:


Figure 5.6: Northbound morning travel time variations are similar


Figure 5.7: Southbound evening travel time variations are similar


Figures 5.6 and 5.7 show that variations in vehicle travel times are in fact closely matched by variations in bus travel times.

The distributions of the standardized travel times are shown below:


Figure 5.8: Greater variation is present in the northbound morning travel times than southbound evening travel times


Figure 5.8 shows that, within the same time period, travel times of both types vary relatively more in the morning towards the city. In contrast, travel times in the evening away from the city vary much less, especially for vehicles. These variations should not be confused with the variation across each entire rush-hour window over the whole period, shown in figure 5.1. As a reminder, the standardization in figure 5.8 is performed relative to travel times in the same hour across the entire period, and is therefore more specific.


Figure 5.9: A positive relationship exists between both travel times for both directions during peak times


Figure 5.9 shows that the standardized travel times are particularly correlated in the evening southbound away from the city.

The correlations between the standardized travel times are:

Table 5.2: Correlation between standardized travel times per direction per rush hour
Rush      Direction   Correlation
Morning   NB          0.65
Morning   SB          0.23
Evening   NB          0.43
Evening   SB          0.74

Table 5.2 shows that strong correlation exists between the standardized travel times during the evening southbound away from the city.


6 Marion Road


Figure 6.1: Marion Rd on map. Source: Google Maps



Figure 6.2: Travel time distributions are similar across rush hours and directions


Figure 6.2 shows that the travel time distributions are similar across rush hours and directions. In contrast, the South Road distributions in figure 5.1 vary between rush hours and directions for both types.


Figure 6.3: Northbound bus travel times are slower than vehicle travel times. Both exhibit similar patterns


Figure 6.4: Southbound bus travel times are slower than vehicle travel times. Both exhibit similar patterns


Figures 6.3 and 6.4 show that both travel types follow a similar travel time pattern.


Figure 6.5: Average travel time patterns by vehicle and direction. Peak times are highlighted


Figure 6.5 shows that both travel types display a small uptick in travel time towards the city during the morning rush hour. In the southbound direction, vehicles have a slightly longer travel time during the evening, while buses show an increase in travel time during both rush hours. By comparison, figure 5.4 shows that on South Road the southbound direction in particular has a large increase in both travel times during the evening rush hour.


Figure 6.6: A positive relationship exists between both travel times during the morning towards the city


Figure 6.6 shows that a positive relationship exists between the two travel types during the morning rush hour towards the city, whereas no relationship seems present during the evening rush hour away from the city.


Table 6.1: Correlation between travel times per direction per rush hour
Rush      Direction   Correlation
Morning   NB          0.70
Morning   SB          0.60
Evening   NB          0.34
Evening   SB          0.39


Table 6.1 displays the correlation figures between the travel times. Only the travel times during the morning towards the city show a relatively strong correlation.


Figure 6.7: Northbound morning travel time variations seem moderately correlated


Figure 6.8: Southbound evening travel time variations are not similar


Figures 6.7 and 6.8 show that the variations in vehicle travel times are not closely matched by variations in bus travel times. This is in contrast to South Road where figures 5.6 and 5.7 show a close relationship between the variations in travel times of both types.


Figure 6.9: The extent of the variability in travel times is similar across the rush hours and travel directions


Figure 6.9 shows that travel times within the same time period generally possess the same level of variation across the rush hour periods and travel directions.


Figure 6.10: No strong relationships are present


Table 6.2: Correlation between standardized travel times per direction per rush hour
Rush      Direction   Correlation
Morning   NB          0.53
Morning   SB          0.18
Evening   NB          0.15
Evening   SB          0.23


Figure 6.10 and table 6.2 show a slight relationship between the standardized travel times during the morning rush hour towards the city, and little or no relationship otherwise. South Road, on the other hand, displays a positive relationship between the standardized travel times both during the morning towards the city and during the evening away from the city, as shown in figure 5.9 and table 5.2.


7 Conclusion, Findings, and Future Directions

The report examined the relationship between bus travel times and motor vehicle travel times, and the robustness of the bus network to road congestion, particularly during rush hour periods. A limitation of the data is that the trip updates dataset provides predicted arrival times at the bus stops, whereas actual arrival times would provide greater accuracy.

An additional objective of the project was to write the analysis code so that it can be reproduced with minimal manual input across different road segments and time periods. The analysis was therefore also conducted on a second road in Adelaide, Marion Road, to enable a comparison with South Road.

The results from the analysis on South Road show the following:

The results from the analysis on Marion Road show the following:

The differing results between the two roads may be a consequence of South Road being much more widely used by commuters traveling to and from the city. Any conclusion regarding bus network resilience to congestion on South Road therefore carries more weight for decision making.

Possible future directions stemming from this analysis include:


8 Appendix

8.1 Bus Trips Updates Errors

  1. All stops on the trip have similar, very large delays or similar, very large early arrivals, meaning that the entire trip is very late or very early. This is most likely due to errors when the information was entered retrospectively at a later time. Based on figure 4.3, the thresholds were set at 2,400 seconds (40 minutes) late and 900 seconds (15 minutes) early. These trips were removed because they would fall in the wrong time frame when analyzed against the vehicle travel times.


  2. A stop has a sudden, large predicted delay, resulting in an arrival time much later than that of the following stop. These stops were removed.


  3. A stop has an arrival time later than that of any following stop, and a timestamp earlier than that of any following stop. These stops were removed, so that the most recent timestamp is preferred when a discrepancy occurs.


  4. A stop’s arrival time is earlier than that of previous stops, and its timestamp is older.


  5. A stop’s arrival time is earlier than that of a prior stop, but both have the same timestamp. In this case it is not possible to know which is correct. We assume the stop with the earlier stop sequence is correct, since it is closer to the bus when the update is made.


  6. Two consecutive stops have identical arrival times but different timestamps. The stop with the older timestamp was removed.


  7. Two consecutive stops have identical arrival times and timestamps. The stop with the higher stop sequence was removed.


  8. Many stops on the same trip have the same arrival time, likely due to retrospective entry errors. These trips were removed.
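Error type 1 can be sketched as a trip-level filter. The sketch assumes that every stop on the trip must exceed the threshold, that is, be more than 2,400 seconds late or more than 900 seconds early. That interpretation, the record layout, and the function name are all assumptions.

```python
from collections import defaultdict

LATE_THRESHOLD = 2400   # more than 40 minutes late, in seconds
EARLY_THRESHOLD = -900  # more than 15 minutes early, in seconds (negative delay)


def drop_whole_trip_delay_errors(rows):
    """Error type 1: remove trips where every stop is very late or every stop is very early."""
    delays = defaultdict(list)
    for r in rows:
        delays[(r["start_date"], r["trip_id"])].append(r["delay"])
    # If even the smallest delay is above the late threshold, every stop is too late;
    # if even the largest delay is below the early threshold, every stop is too early.
    bad = {k for k, d in delays.items() if min(d) > LATE_THRESHOLD or max(d) < EARLY_THRESHOLD}
    return [r for r in rows if (r["start_date"], r["trip_id"]) not in bad]
```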


9 References